Limit aiohttp simultaneously opened connections to NVD #1093

Merged
merged 1 commit into intel:main on Mar 11, 2021

Conversation

imsahil007
Contributor

Pull request #1091 by @param211 can be an alternate solution for #1081. But I manually tried limiting connections to NVD, and it seems NVD allows a maximum of 20 simultaneous connections per host.

    connector = aiohttp.TCPConnector(limit_per_host=19)
    self.session = aiohttp.ClientSession(connector=connector, trust_env=True)

This worked for me every time.
I am just putting it here as a feasible solution, @terriko.
Also, the only files I changed are cvedb.py and test_cvedb.py (the other changes are related to isort).
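For context, here is a minimal, self-contained sketch of the approach in use (the feed URL list and fetch helper are illustrative, not the actual cvedb.py code):

    import asyncio

    import aiohttp

    # Illustrative NVD year-feed URLs; cvedb.py builds its own list.
    FEED_URLS = [
        f"https://nvd.nist.gov/feeds/json/cve/1.1/nvdcve-1.1-{year}.json.gz"
        for year in range(2002, 2022)
    ]

    async def download_all():
        # Stay below NVD's observed limit of 20 simultaneous connections per host.
        connector = aiohttp.TCPConnector(limit_per_host=19)
        async with aiohttp.ClientSession(connector=connector, trust_env=True) as session:

            async def fetch(url):
                async with session.get(url) as response:
                    response.raise_for_status()
                    return await response.read()

            return await asyncio.gather(*(fetch(url) for url in FEED_URLS))

    feeds = asyncio.run(download_all())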

@imsahil007
Contributor Author

might fix #1081

@param211
Contributor

param211 commented Mar 9, 2021

Hey @imsahil007!

  • I think what's happening here is that decreasing the number of concurrent connections inadvertently decreases the number of requests made per unit time too.

  • Part of the reason why the downloads worked fine on our local machines (for most of us) while failing in CI could be that the CI environment had far higher download speeds, and thus reached the rate limit. If there were a host limit, the downloads on our local machines would've failed too. (I might be wrong here.)

  • My guess is that limit_per_host=19 constrains the requests just enough that CI's download speeds aren't able to breach the rate limit. However, if someone were to use even higher download speeds, each request would close faster (while remaining under the limit_per_host) and we would still breach the rate limit.

Point being that limit_per_host in conjunction with a particular download speed might keep the requests under the rate limit, but with a different/higher download speed the rate limit would be breached anyway.

I would note, however, that this is all conjecture, so I might be wrong ;)
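To make the distinction concrete: a connection cap bounds how many requests are in flight at once, while a rate limiter bounds how many may start per unit time. A minimal sketch of the latter (this RateLimiter class is hypothetical, not the code from #1091):

    import asyncio
    import time

    class RateLimiter:
        """Allow at most `rate` request starts per second, regardless of download speed."""

        def __init__(self, rate: float):
            self.interval = 1.0 / rate
            self.next_slot = time.monotonic()
            self.lock = asyncio.Lock()

        async def wait(self):
            async with self.lock:
                now = time.monotonic()
                if now < self.next_slot:
                    await asyncio.sleep(self.next_slot - now)
                self.next_slot = max(now, self.next_slot) + self.interval

Each task would call `await limiter.wait()` before issuing its request, so even with unlimited bandwidth no more than `rate` requests could start per second.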

@imsahil007
Contributor Author

> Hey @imsahil007!
>
>   • I think what's happening here is that decreasing the number of concurrent connections inadvertently decreases the number of requests made per unit time too.
>   • Part of the reason why the downloads worked fine on our local machines (for most of us) while failing in CI could be that the CI environment had far higher download speeds, and thus reached the rate limit. If there were a host limit, the downloads on our local machines would've failed too. (I might be wrong here.)
>   • My guess is that limit_per_host=19 constrains the requests just enough that CI's download speeds aren't able to breach the rate limit. However, if someone were to use even higher download speeds, each request would close faster (while remaining under the limit_per_host) and we would still breach the rate limit.
>
> Point being that limit_per_host in conjunction with a particular download speed might keep the requests under the rate limit, but with a different/higher download speed the rate limit would be breached anyway.
>
> I would note, however, that this is all conjecture, so I might be wrong ;)

Yes, I am decreasing the number of simultaneous connections that make separate requests, so it will decrease the requests per unit time, since the connection count is lower than the default value of 50.
I don't know about other users, but I wasn't able to download the NVD data even a single time using the previous code (even using semaphores). I tried running cve-bin-tool at least 20 times. So the issue was related less to internet download speed than to the parallel connections used to speed up the scraping.

You can say that limit_per_host=19 limits the requests to a sweet spot, but what it actually does is cap simultaneous connections, and the right cap may vary according to how a site is deployed and how its requests are restricted.
You can read more about it here: https://docs.aiohttp.org/en/stable/client_reference.html
I am not saying this is a better solution to the problem. I am just putting it here as a feasible solution, nothing else.
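For reference, the two connector knobs described in those docs (the values here are only for illustration):

    import aiohttp

    connector = aiohttp.TCPConnector(
        limit=100,          # total simultaneous connections in the pool (aiohttp default: 100)
        limit_per_host=19,  # simultaneous connections per endpoint (aiohttp default: 0, i.e. unlimited)
    )
    # Neither knob bounds requests per unit time; both only cap open connections.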

@terriko
Contributor

terriko commented Mar 9, 2021

We're having a bit of a discussion about which of the two currently working solutions to #1081 is going to be most robust before I choose one. But in the meantime, do you mind splitting the isort fixes into a separate pull request? We can definitely get those merged so they don't muddy up anyone else's pull requests!

@terriko
Contributor

terriko commented Mar 11, 2021

Because #1091 introduces a new MIT-licensed component, I've got some additional paperwork to do before I can merge it, to make sure we have appropriate license compliance and follow best practices for keeping the license clear.

I'm going to merge this fix for now just so CI isn't broken, but I believe the rate limit proposed in #1091 is useful (this PR is a connection limit), so we'll be integrating the two to play well together after I deal with the licensing work.
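To sketch how the two could compose (the one-request-per-interval pacing below is an assumed stand-in for #1091's rate limiter, not its actual code):

    import asyncio

    import aiohttp

    async def fetch(session, url):
        async with session.get(url) as response:
            return await response.read()

    async def download_all(urls, interval=0.1):
        # Connection limit (this PR): at most 19 simultaneous connections to NVD.
        connector = aiohttp.TCPConnector(limit_per_host=19)
        async with aiohttp.ClientSession(connector=connector, trust_env=True) as session:
            tasks = []
            for url in urls:
                # Rate limit (stand-in for #1091): start at most one request per
                # `interval` seconds, however quickly earlier requests finish.
                await asyncio.sleep(interval)
                tasks.append(asyncio.create_task(fetch(session, url)))
            return await asyncio.gather(*tasks)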

terriko merged commit b72e458 into intel:main on Mar 11, 2021